NSF PAR Search | NSF Public Access Repository

Network Estimation by Mixing: Adaptivity and More

https://doi.org/10.1080/01621459.2023.2252137

Li, Tianxi; Le, Can M. (July 2024, Journal of the American Statistical Association)

Full Text Available
On the Feasibility and Benefits of Extensive Evaluation

https://doi.org/10.1145/3677137

Hui, Yujie; Yu, Miao; Qi, Hao; Gan, Yifan; Li, Tianxi; Li, Yuke; Ren, Xueyuan; Ma, Sixiang; Lu, Xiaoyi; Wang, Yang (October 2024, Proceedings of the ACM on Management of Data)

Benchmark and system parameters often have a significant impact on performance evaluation, which raises a long-lasting question about which settings we should use. This paper studies the feasibility and benefits of extensive evaluation. A full extensive evaluation, which tests all possible settings, is usually too expensive. This work investigates whether it is possible to sample a subset of the settings and, upon them, generate observations that match those from a full extensive evaluation. Towards this goal, we have explored the incremental sampling approach, which starts by measuring a small subset of random settings, builds a prediction model on these samples using the popular ANOVA approach, adds more samples if the model is not accurate enough, and terminates otherwise. To summarize our findings: 1) Enhancing a research prototype to support extensive evaluation mostly involves changing hard-coded configurations, which does not take much effort. 2) Some systems are highly predictable, which means that they can achieve accurate predictions with a low sampling rate, but some systems are less predictable. 3) We have not found a method that can consistently outperform random sampling + ANOVA. Based on these findings, we provide recommendations to improve artifact predictability and strategies for selecting parameter values during evaluation.
Full Text Available
On the Feasibility and Benefits of Extensive Evaluation

Hui, Yujie; Yu, Miao; Qi, Hao; Gan, Yifan; Li, Tianxi; Li, Yuke; Ren, Xueyuan; Ma, Sixiang; Lu, Xiaoyi; Wang, Yang (August 2024, ACM SIGMOD'25)

Full Text Available
Community models for networks observed through edge nominations

Li, Tianxi; Levina, Elizaveta; Zhu, Ji (October 2023, Journal of Machine Learning Research)

Communities are a common and widely studied structure in networks, typically assuming that the network is fully and correctly observed. In practice, network data are often collected by querying nodes about their connections. In some settings, all edges of a sampled node will be recorded, and in others, a node may be asked to name its connections. These sampling mechanisms introduce noise and bias, which can obscure the community structure and invalidate assumptions underlying standard community detection methods. We propose a general model for a class of network sampling mechanisms based on recording edges via querying nodes, designed to improve community detection for network data collected in this fashion. We model edge sampling probabilities as a function of both individual preferences and community parameters, and show community detection can be performed by spectral clustering under this general class of models. We also propose, as a special case of the general framework, a parametric model for directed networks we call the nomination stochastic block model, which allows for meaningful parameter interpretations and can be fitted by the method of moments. In this case, spectral clustering and the method of moments are computationally efficient and come with theoretical guarantees of consistency. We evaluate the proposed model in simulation studies on unweighted and weighted networks and under misspecified models. The method is applied to a faculty hiring dataset, discovering a meaningful hierarchy of communities among US business schools.
Full Text Available
On the Discontinuation of Persistent Memory: Looking Back to Look Forward

Li, Tianxi; Wang, Yang; Lu, Xiaoyi (June 2023, Workshop on Hot Topics in System Infrastructure, June 18, 2023, Orlando, Florida, USA, co-located with ISCA 2023)

Full Text Available
Informative core identification in complex networks

https://doi.org/10.1093/jrsssb/qkac009

Miao, Ruizhong; Li, Tianxi (January 2023, Journal of the Royal Statistical Society Series B: Statistical Methodology)

In a complex network, the core component with interesting structures is usually hidden within noninformative connections. The noises and bias introduced by the noninformative component can obscure the salient structure and limit many network modeling procedures’ effectiveness. This paper introduces a novel core–periphery model for the noninformative periphery structure of networks without imposing a specific form of the core. We propose spectral algorithms for core identification for general downstream network analysis tasks under the model. The algorithms enjoy strong performance guarantees and are scalable for large networks. We evaluate the methods by extensive simulation studies demonstrating advantages over multiple traditional core–periphery methods. The methods are also used to extract the core structure from a citation network, which results in a more interpretable hierarchical community detection.
Full Text Available
Fitting low-rank models on egocentrically sampled partial networks

Chan, Angus G; Li, Tianxi (January 2023, Proceedings of Machine Learning Research)

Full Text Available
Linear Regression and Its Inference on Noisy Network-Linked Data

https://doi.org/10.1111/rssb.12554

Le, Can M.; Li, Tianxi (November 2022, Journal of the Royal Statistical Society Series B: Statistical Methodology)

Linear regression on network-linked observations has been an essential tool in modelling the relationship between response and covariates with additional network structures. Previous methods either lack inference tools or rely on restrictive assumptions on social effects and usually assume that networks are observed without errors. This paper proposes a regression model with non-parametric network effects. The model does not assume that the relational data or network structure is exactly observed and can be provably robust to network perturbations. Asymptotic inference framework is established under a general requirement of the network observational errors, and the robustness of this method is studied in the specific setting when the errors come from random network models. We discover a phase-transition phenomenon of the inference validity concerning the network density when no prior knowledge of the network model is available while also showing a significant improvement achieved by knowing the network model. Simulation studies are conducted to verify these theoretical results and demonstrate the advantage of the proposed method over existing work in terms of accuracy and computational efficiency under different data-generating models. The method is then applied to middle school students' network data to study the effectiveness of educational workshops in reducing school conflicts.
Full Text Available
Link Prediction for Egocentrically Sampled Networks

https://doi.org/10.1080/10618600.2022.2163648

Li, Tianxi; Wu, Yun-Jhong; Levina, Elizaveta; Zhu, Ji (January 2023, Journal of Computational and Graphical Statistics)

Full Text Available
A Study of Database Performance Sensitivity to Experiment Settings

https://doi.org/10.14778/3523210.3523221

Wang, Yang; Yu, Miao; Hui, Yujie; Zhou, Fang; Huang, Yuyang; Zhu, Rui; Ren, Xueyuan; Li, Tianxi; Lu, Xiaoyi (September 2022, Proceedings of the VLDB Endowment)

Full Text Available
